fix(qemu): improve VM shutdown with graceful timeouts and PID safety#2479
Merged
- Suppress expected `ExitMissingError` logs when the VM powers off abruptly during SSH shutdown
- Add a graceful multi-stage shutdown: wait 5s for the process to exit, then send SIGTERM and wait another 5s, then send SIGKILL
- Store `*os.Process` instead of a raw PID so signals can never be delivered to an unrelated process that has reused the PID
- Guarantee the QEMU process has exited cleanly before `TerminatePod` returns

Fixes the race condition where libvterm builds showed spurious ERRO/WARN messages, while also making shutdown more robust and safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
egibs approved these changes (Apr 13, 2026)
smoser added three commits to smoser/melange that referenced this pull request (Apr 14, 2026)
smoser added a commit that referenced this pull request (Apr 14, 2026):

We should not train people or machines to ignore red ERROR messages. With this change and #2479, a successful build produces zero ERROR log entries.

Previously, RetrieveObservabilityEvents always sent three `test -f` SSH commands to probe for the observability events file, even when the hook was never installed. Each probe exits non-zero (file not found), so sendSSHCommand logged ERROR to the console three times on every build.

During CPIO generation, scan the base initramfs for the hook's sentinel file (etc/tetragon/tetragon.tp.d/network-monitor.yaml) and record the result in cfg.ObservabilityHook. This is accurate regardless of how the package got into the image: QEMU_ADDITIONAL_PACKAGES, QEMU_BASE_INITRAMFS, or any other mechanism.

RetrieveObservabilityEvents returns immediately when ObservabilityHook is false, and treats a missing events file as an error when it is true. We can now also correctly report an ERROR when an observability hook _was_ installed, rather than assuming it was not there.

Store the result of that scan in a sidecar (`<cpio>.observability`) so a cached initramfs does not have to be rescanned. The sidecar is invalidated automatically when the CPIO is newer (fresh build, QEMU_ADDITIONAL_PACKAGES change, etc.).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>